STK-6401

The Affordable 
Mini-Supercomputer 
with Muscle 




SUPERTEK COMPUTERS 







STK-6401

The STK-6401 is a high-performance 
mini-supercomputer which is fully 
compatible with the Cray X-MP/48™ 
instruction set, including some 
important operations not found 
in the low-end Cray machines. 

The system design combines 
advanced TTL/CMOS technologies 
with a highly optimized architecture. 
By taking advantage of mature, 
multiple-sourced off-the-shelf devices, 
the STK-6401 offers performance 
equal to or better than comparable 
mini-supers at approximately half 
their cost. 

Additional benefits of this design 
approach are much smaller size, 
low power consumption, the ability 
to operate with fan cooling, and 
intrinsic high reliability. 

Central Processing Unit 

The STK-6401 architecture is based 
on five major, tightly-coupled sub- 
systems: Instruction Unit, Vector Unit, 
Scalar Unit, Memory Unit, and I/O 
Processor. This structure yields a 
peak computational rate of 40 MFLOPS 
and high throughputs for a wide 
range of applications with various 
degrees of vectorizability or inherent 
parallelism. 

The Instruction Unit executes the 
Cray X-MP instruction set, enabling 
programs currently running on a Cray 
to be used without change on the 
STK-6401. 

The Vector Unit contains a multi- 
ported vector register file which 
supports as many as 16 word 
transfers per clock cycle — with a 
bandwidth of 2.56 GB/s. It can fully 
support all concurrent vector opera- 
tions as well as vector-memory and 
vector-scalar data transfers. Hence, 
peak or near-optimal vector perfor- 
mance can readily be sustained in 
most applications. 

The Scalar Unit contains a multi- 
ported scalar register file that sup- 
ports simultaneous scalar operations 
with low latencies. Its 20-MIPS peak 
performance for 64-bit scalar opera- 
tions, supported by the Instruction 
Unit which issues instructions at the 
maximum rate of one per cycle, 
makes the STK-6401 suitable for 
many scalar-oriented applications.
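
The quoted peak figures are mutually consistent if one assumes a single system clock. The sketch below is a back-of-envelope check, not a published specification: it assumes a 64-bit word is 8 bytes and that GB is decimal, and it derives a clock rate that the text itself never states.

```python
# Back-of-envelope check of the quoted peak rates.
# Assumptions (not stated in the brochure): 8-byte words, decimal GB,
# and one common clock for the Vector, Scalar, and Instruction Units.

WORD_BYTES = 8                  # one 64-bit word
VECTOR_WORDS_PER_CYCLE = 16     # quoted: up to 16 word transfers per clock
VECTOR_BANDWIDTH = 2.56e9       # quoted: 2.56 GB/s, in bytes per second
PEAK_MIPS = 20e6                # quoted: 20 MIPS, one instruction per cycle
PEAK_MFLOPS = 40e6              # quoted: 40 MFLOPS


def implied_clock_hz(bandwidth, words_per_cycle, word_bytes=WORD_BYTES):
    """Clock rate implied by a bandwidth and a per-cycle transfer count."""
    return bandwidth / (words_per_cycle * word_bytes)


clock_hz = implied_clock_hz(VECTOR_BANDWIDTH, VECTOR_WORDS_PER_CYCLE)
cycle_ns = 1e9 / clock_hz
flops_per_cycle = PEAK_MFLOPS / clock_hz
instructions_per_cycle = PEAK_MIPS / clock_hz

print(f"implied clock: {clock_hz / 1e6:.0f} MHz ({cycle_ns:.0f} ns cycle)")
print(f"peak floating-point results per cycle: {flops_per_cycle:.0f}")
print(f"peak instructions per cycle: {instructions_per_cycle:.0f}")
```

Under these assumptions the figures line up at a 20 MHz clock, with one instruction issued and two floating-point results (for example, one add and one multiply) per cycle.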

Central Memory 

The STK-6401 Memory Unit serves 
the other major subsystems at very 
high data transfer rates. Its 4-ported
design supports two vector reads, one
vector write, and one I/O transfer 
with an aggregate bandwidth of 
640 MB/s. Bank conflicts are reduced 
to a minimum by a 16-way, fully 
interleaved structure. 
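
The port count and interleaving figures can be related with a short calculation. The sketch below assumes each port moves one 8-byte word per cycle at a 20 MHz clock (a rate inferred from the other quoted figures, not stated in the text), and it assumes the common low-order interleaving scheme in which word address a lives in bank a mod 16. The brochure does not describe the actual bank mapping, so `bank_of` and `banks_touched` are illustrative names only.

```python
from math import gcd

# Assumptions (not from the brochure): 8-byte words, a 20 MHz clock, and
# low-order interleaving, i.e. word address a maps to bank a % NUM_BANKS.
WORD_BYTES = 8
CLOCK_HZ = 20e6
NUM_PORTS = 4       # quoted: two vector reads, one vector write, one I/O
NUM_BANKS = 16      # quoted: 16-way interleaved


def aggregate_bandwidth(ports=NUM_PORTS):
    """Bytes per second if every port moves one word per cycle."""
    return ports * WORD_BYTES * CLOCK_HZ


def bank_of(address, banks=NUM_BANKS):
    """Bank holding a word address under low-order interleaving."""
    return address % banks


def banks_touched(stride, banks=NUM_BANKS):
    """Number of distinct banks visited by a constant-stride access."""
    return banks // gcd(stride, banks)


print(aggregate_bandwidth() / 1e6, "MB/s across four ports")
print(aggregate_bandwidth(1) / 1e6, "MB/s for the single I/O port")
for stride in (1, 2, 3, 8, 16):
    print(f"stride {stride:2d}: {banks_touched(stride):2d} banks in rotation")
```

Under this assumed mapping, unit-stride and odd-stride accesses rotate through all 16 banks, while a stride of 16 returns to the same bank on every reference, which is the pattern interleaving cannot help.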

Coupled with the multi-ported vec- 
tor register file and a built-in vector 
chaining capability, the Memory Unit 
makes most vector operations run as 
if they were efficient memory-to- 
memory operations. This important 
feature is not offered in most of the 
machines on the market today. 

I/O Subsystem 

The I/O subsystem of the STK-6401 
communicates with central memory 
via a high-speed port which is trans- 
parent to CPU operation. This port 
has a bandwidth of 160 MB/sec and 
is available to multiple data paths 
with individual bandwidths of up to 
50 MB/sec. Controllers, based on the 
VMEbus, manage the data flow 
associated with these channels. 

The flexibility of this approach 
enables very high density disk drives 
to be interfaced easily to the STK-6401, 
allowing accommodation of new drives 
as they become available. Currently 
both 2.4 MB/sec and 12.5 MB/sec 
drives are offered. 

High-performance magnetic tape 
units, terminals, and networking via 
both Ethernet (TCP/IP) and HYPER- 
channel™ are supported. 

The STK-6401's I/O structure also
makes high bandwidth channels 
available for customer-specific I/O. 

Productive Software Environment 

The Cray Time Sharing System 
(CTSS) was specifically designed to 
give computational scientists and 
applications developers a highly- 
productive, interactive, supercom- 
puting environment. 

Large, complex computational 
models can be developed rapidly and 
efficiently using CTSS' broad range 
of facilities — including advanced 
text editing, powerful symbolic 
debugging, fast turnaround of testing, 
and interaction with long running 
codes. 

The STK-6401's sophisticated
FORTRAN applications environment, 
coupled with bit-for-bit instruction 
compatibility with the Cray X-MP, 
lets users retain their current 
FORTRAN applications interface 
while also taking advantage of the
more than 300 third-party and public
domain applications developed for 
the Cray 1™ and Cray X-MP 
architectures. 

A UNIX™ environment is also 
available under CTSS. 

Concurrent Interactive and 
Batch Processing 

CTSS accommodates the differing 
requirements of applications develop- 
ment and long-running, computa- 
tionally intensive codes. As a result, 
concurrent interactive and batch 
access to the STK-6401 is supported 
with no degradation in system per- 
formance. CTSS manages multiple 
concurrent processes for efficient 
sharing of the STK-6401's resources.
User-controlled preemptive priority 
scheduling allows users to control 
resource allocation and system 
workload, for optimal use of the 
STK-6401. 

To simplify the user's interface with
the system, CTSS provides a single 
command language for both interac- 
tive and batch access; the batch job 
manager — COSMOS — accepts a 
directives file containing commands 
in the same form as would be used 
interactively. 

Program Recovery/Restart Facility

To aid recovery and restart, CTSS 
writes a running program's memory
image to a file (called a dropfile) 
whenever the program is temporarily 
removed from memory or terminates 
abnormally. CTSS creates the dropfile 
in the user's directory (where it is 
accessible to the user) as part of the 
setup for program execution. 

When a program terminates abnor- 
mally, the dropfile receives the pro- 
gram's memory image together with 
all the system information needed to 
restart execution from the point at 
which it was interrupted. Since the 
dropfile is itself an executable file, 
the program may be recovered simply 
by executing the dropfile. 

High-Performance Disk I/O 

Many engineering and scientific 
applications require very high disk 
I/O bandwidth to properly support 
their computational workload. 

CTSS has been designed to support 
exceptionally high I/O rates via a 
combination of features: file system 
overhead is reduced through a streamlined,
optimized disk file index structure;
disk positioning overhead is
minimized by allocating to new files 
the largest possible blocks of con- 
tiguous disk space; I/O transfers are 
optimized by moving data in multi- 
ples of disk sectors (512 64-bit words); 
operating system I/O processing 
overhead is substantially lowered by 
performing all I/O processing tasks 
in an intelligent IOP subsystem.

Further, CTSS and the FORTRAN 
Run-Time Library fully support 
asynchronous I/O, thus enabling 
applications to take advantage of 
computational and I/O overlap. 
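
The benefit of asynchronous I/O is easiest to see as a double-buffering pattern: start writing one block, then compute the next block while that write is still in flight. The sketch below illustrates the general pattern in Python with a worker thread. It is not CTSS or FORTRAN Run-Time Library code, and `compute_block`, `write_block`, and `run_overlapped` are hypothetical names. It also pads each transfer to a whole number of sectors, using the 512-word sector size quoted above and assuming 8-byte words.

```python
import io
from concurrent.futures import ThreadPoolExecutor

WORD_BYTES = 8                        # assumed size of a 64-bit word
SECTOR_BYTES = 512 * WORD_BYTES       # quoted: 512 64-bit words per sector


def pad_to_sector(data):
    """Pad a byte string with zeros up to a whole number of sectors."""
    remainder = len(data) % SECTOR_BYTES
    if remainder:
        data += bytes(SECTOR_BYTES - remainder)
    return data


def compute_block(step):
    """Stand-in for one step of a long-running computation."""
    return bytes([step % 256]) * 5000


def write_block(stream, data):
    """Stand-in for a slow device write; returns the bytes written."""
    return stream.write(data)


def run_overlapped(stream, steps):
    """Compute block n+1 while block n is still being written."""
    total = 0
    pending = None
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in range(steps):
            data = pad_to_sector(compute_block(step))   # overlaps the write
            if pending is not None:
                total += pending.result()               # wait for old write
            pending = pool.submit(write_block, stream, data)
        if pending is not None:
            total += pending.result()
    return total


output = io.BytesIO()
written = run_overlapped(output, steps=3)
print(written, "bytes written in", written // SECTOR_BYTES, "sectors")
```

A single worker keeps the writes in submission order, so the only concurrency is between one outstanding write and the computation of the next block.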

FORTRAN Applications 
Development Environment 

CTSS provides an efficient FORTRAN 
environment for the applications pro- 
grammer through a powerful vec- 
torizing compiler, scientific libraries, 
and dynamic debugging. 


Vectorizing Compiler: The Cray 
FORTRAN Compiler (CFT) is an 
optimizing, vectorizing compiler that 
supports language and library 
enhancements for vector processing. 
Existing FORTRAN applications
programs can therefore take full
advantage of the STK-6401's outstanding
vector performance.

CFT enhancements also support 
other manufacturers' extensions to the 
ANSI '77 FORTRAN standard, such 
as the VMS™ FORTRAN extensions. 
Consequently CFT assists the pro- 
grammer's productivity and maximizes 
software execution speed and 
portability. 

Scientific Libraries: Under CTSS four 
system libraries are available to the 
applications developer. The library 
interface is optimized to achieve max- 
imum program performance without 
the programmer having to be
concerned with underlying system and
hardware dependencies.

MATHLIB and OMNILIB provide 
optimized and vectorized basic math- 
ematical functions and high-level 
mathematics and scientific routines, 
including the Basic Linear Algebra 
(BLAS) routines. FORTLIB and 
CFTLIB furnish optimized systems 
support routines, including high- 
performance, asynchronous 
FORTRAN I/O. 

Dynamic Symbolic Debugging: Under 
CTSS the applications developer has 
a powerful and convenient means of 
troubleshooting code — the Dynamic 
Debug Tool (DDT). Since CTSS 
allows a program to directly control 
the execution of another, DDT may 
be used on a program's dropfile for 
debugging or for post-mortem dumps 
without having to recompile or relink 
the applications program. 





Hardware Specifications 

Architecture 

— Full Cray X-MP/48 instruction set. 

— Hardware support for scatter/gather, compressed 
index, and enhanced addressing mode. 
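
For readers unfamiliar with the operations named in the list above, the sketch below shows their usual meaning on plain Python lists. It is a semantic illustration only, assuming the conventional definitions of gather, scatter, and compressed index. It says nothing about how the hardware implements them, and the function names are made up for this example.

```python
def gather(memory, indices):
    """Load memory[i] for each index i into a dense vector."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Store each value to memory[i] for the matching index i."""
    for i, value in zip(indices, values):
        memory[i] = value


def compressed_index(vector, predicate):
    """Return the positions of the elements that satisfy the predicate."""
    return [i for i, element in enumerate(vector) if predicate(element)]


# Example: double only the negative elements of a vector in place.
data = [4.0, -1.5, 0.0, -3.0, 2.5]
negative_positions = compressed_index(data, lambda x: x < 0)
selected = gather(data, negative_positions)
scatter(data, negative_positions, [2 * x for x in selected])
print(negative_positions, data)
```

The three steps together let a conditional update run over a dense vector of selected elements rather than testing every element inside the arithmetic loop.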

Computation Rate 

— 40 MFLOPS peak vector performance.

— 20 MIPS peak scalar performance. 

Central Memory 

— 640 MB/s aggregate bandwidth. 

— 160 MB/s bandwidth to I/O. 

— Up to 128 MBytes of storage. 

— Error detection/correction (SECDED).

Vector Registers (64-Bit) 

— 8 64-word registers. 

— 2.56 GB/s aggregate bandwidth. 

Scalar Registers (64-Bit) 

— 8 registers (S). 

— 64 buffer registers (T). 

Address Registers (24-Bit) 

— 8 address registers (A). 

— 64 buffer registers (B). 

Functional Units 

— 13 functional units. 

— Concurrent operation. 

— Floating point, integer, logical operations for 
vector, scalar, and address operands. 

Input/Output 

— I/O subsystem supports terminals, tape, disk, 
and networking. 

— Disk bandwidth in multiples of 12.5 MB/s. 
Specifications subject to change without notice. 


Software Specifications 

CTSS Operating System 

— Interactive/batch. 

— Multi-user. 

— Hierarchical file system. 

— Process priority levels. 

— Interprocess communication. 

— Windowing capability. 

— UNIX™ environment. 

FORTRAN Applications 
Development Environment 

CFT (CRAY FORTRAN compiler) 

— ANSI 77. 

— Scalar optimization. 

— Automatic vectorization. 

— VMS™ FORTRAN extensions. 

DDT (Dynamic Debug Tool) 

— Interactive symbolic debugging without 
requiring code recompilation. 

— User may specify execution breakpoints and trace- 
points, and examine and alter values of variables. 

LDR (Loader) 

— Run-time code segmentation. 

UPDATE (Source code control) 

— Source code management librarian. 

— Audit trail of code changes. 

— Reversibility of changes. 

LIB (Object and source code control) 

— Mix source, data, object, and binary in a single 
library. 

Math/Science Libraries 

— Optimized for maximum run-time performance. 

Reference to use of CTSS and CIVIC on the STK-6401 does not imply endorsement by 
the U.S. Government or the University of California. 

Reference to CFT, a product of Cray Research Inc, does not imply endorsement by CRI. 
Cray 1 and Cray X-MP are trademarks of Cray Research Inc. 

UNIX is a trademark of AT&T. 

VMS is a trademark of Digital Equipment Corp. 

HYPERchannel is a trademark of Network Systems Corp. 

Printed in U.S.A. 4/87.

The Supertek S-1 Mini-Supercomputer


The S-1 is a high-performance mini- 
supercomputer which is fully compat- 
ible with the Cray X-MP/416™ 
instruction set, including some 
important operations not found in the 
early Cray machines. 

The system design combines 
advanced TTL/CMOS technologies 
with a highly optimized architecture. 
By taking advantage of mature, 
multiple-sourced off-the-shelf devices, 
the S-1 offers performance higher 
than its competitors at a substantially 
lower price. 

Additional benefits of this design 
approach are much smaller size, low 
power consumption, the ability to 
operate with fan cooling, and intrinsic 
high reliability. 

Central Processing Unit 

The S-1 architecture is based on five 
major, tightly-coupled subsystems: 
Instruction Unit, Vector Unit, Scalar 
Unit, Memory Unit, and I/O Subsys- 
tem. This structure yields a peak 
computational rate of 40 MFLOPS 
and high throughput for a wide range 
of applications with various degrees of 
vectorizability or inherent parallelism. 

The Instruction Unit executes the 
Cray X-MP instruction set, enabling 
programs currently running on a Cray 
to be used without modification on 
the S-1. 

The Vector Unit contains a multi- 
ported vector register file which 
supports as many as 16 word transfers 
per clock cycle — with a bandwidth 
of 2.56 GB/s. It can fully support all
concurrent vector operations includ- 
ing vector-memory and vector-scalar 
data transfers. Hence, peak or near- 
optimal vector performance can be 
readily sustained in most applications. 


The Scalar Unit contains a multi- 
ported scalar register file that sup- 
ports simultaneous scalar operations 
with low latencies. With 20-MIPS
peak performance for 64-bit scalar 
operations, and an Instruction Unit 
which can issue instructions at the 
rate of one per cycle, the S-1 provides
robust scalar processing. 

The S-1 Memory Unit serves the 
other major subsystems at very high 
data transfer rates. Its 4-ported 
memory design has an aggregate
bandwidth of 640 MB/s and supports 
two vector reads, one vector write, 
and one I/O transfer. The memory’s 
16-way, fully interleaved structure 
reduces bank conflicts to a minimum. 
Coupled with the multi-ported vector 
register file and a built-in vector 
chaining capability, the Memory Unit 
makes most vector operations run as 
if they were efficient register-to- 
register operations. 

I/O Subsystem 

The S-1 Series I/O Subsystem (IOS)
comprises multiple I/O Modules
(IOMs), each based on the
industry-standard VMEbus. The S-1 takes
full advantage of this architecture by
distributing operating system functions
across the central processing
unit and the multiple IOMs.

The IOMs connect to the S-1 via
the high-speed 160 MB/second
channel. Each IOM is controlled by a
Master I/O Processor and contains
slots for peripheral I/O processors.
The Master I/O Processor is driven
by a real-time, event-driven operating
system (RTIOS™) that processes
external interrupts and the Central
Processor I/O requests,
and executes peripheral driver
routines. The central operating
system and the IOMs communicate
via messages and queues. Shifting
the peripheral processing burden to
the I/O Subsystem in this way frees
the central processor for
high-performance computation.
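
The division of labor described above, with requests passed as messages through queues to an independent I/O processor, can be sketched with two queues and a worker thread. This is a generic illustration of the pattern, not RTIOS or Supertek code. The message layout and the `io_module` function are assumptions made for this example.

```python
import queue
import threading


def io_module(requests, replies):
    """Hypothetical I/O module: service requests until told to stop."""
    while True:
        message = requests.get()
        if message is None:          # sentinel: shut down
            break
        request_id, device, payload = message
        # A real driver routine would talk to the device here.
        replies.put((request_id, f"{device}: {len(payload)} bytes done"))


requests = queue.Queue()
replies = queue.Queue()
worker = threading.Thread(target=io_module, args=(requests, replies))
worker.start()

requests.put((1, "disk", b"x" * 4096))
requests.put((2, "tape", b"y" * 512))
# The "central processor" is free to compute while the requests are serviced.
busy_work = sum(i * i for i in range(1000))

requests.put(None)
worker.join()
results = [replies.get() for _ in range(2)]
print(busy_work, results)
```

The requester never calls a driver routine directly; it only enqueues a message and later collects the reply, which is what allows computation to continue in the meantime.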

The intelligent peripheral I/O 
Processors in each I/O Module 
control various peripheral devices. 
Included are high speed disks, tapes, 
printers, terminals, and network 
interfaces. This provides full,
stand-alone functionality for the S-1 as well
as networking connectivity.

Reliability-Availability-Serviceability 

Each S-1 system incorporates sophis- 
ticated features to support a well 
conceived Reliability/Availability/ 
Serviceability program. 

The Master IOP supports an
independent Service Processor
(SP) which controls the S-1’s
advanced diagnostic subsystem, the
central operating system “bootstrap”,
and S-1 CPU initialization. The SP
also maintains a log of the system’s
detected and corrected errors. A
self-contained subsystem, the SP
includes its own processor and local
memory, an 800 MByte disk, a cartridge
tape drive, and communications ports
for the operator’s console and for
remote diagnosis.

The SP can set and examine the 
state of internal registers and step the 
functional units through execution 
cycles using the independent 
diagnostic Scanbus. This approach 
provides quick fault detection with a 
high level of confidence. 


Hardware Specifications


Architecture 

— Full Cray X-MP/416 instruction 
set. 

— Hardware support for scatter/ 
gather, compressed index, and 
enhanced addressing mode. 

Computation Rate 

— 40 MFLOPS peak vector 
performance. 

— 20 MIPS peak scalar perform- 
ance. 

Central Memory 

— 640 MB/s aggregate bandwidth. 

— 160 MB/s bandwidth to I/O. 

— Up to 128 MBytes of storage. 

— Error detection/correction 
(SECDED). 

Vector Registers (64-Bit) 

— Eight 64-word registers. 

— 2.56 GB/s aggregate bandwidth. 


Scalar Registers (64-Bit) 

— 8 registers (S). 

— 64 buffer registers (T). 

Address Registers (24-Bit) 

— 8 address registers (A). 

— 64 buffer registers (B). 

Functional Units 

— 13 functional units. 

— Concurrent operation. 

— Floating point, integer, logical 
operations for vector, scalar, and 
address operands. 

Input/Output 

— I/O subsystem supports terminals,
tapes, disks, printers, and
networking.

— Disk bandwidth 2.4 MB/s 
standard; optional high-speed
disks in multiples of 12.5 MB/s. 

Cray X-MP is a registered trademark of Cray Research, Inc.

Specifications subject to change without notice.

S-1 - UNIX™ OPERATING SYSTEM


Supertek UNIX™ with Supercom- 
puting Extensions 

Supertek UNIX™ sets a new standard 
for ease of use and efficiency among 
supercomputer operating systems. 
Designed specifically for the S-1 series 
of mini-supercomputers, Supertek 
UNIX provides the optimal computing 
environment for engineering and
scientific users. 

Derived from AT&T UNIX System 
V, Supertek UNIX provides extensive 
functionality specifically designed to 
support the broad range of applications 
in the scientific computing environment. 
By combining the familiar and proven
timesharing capabilities of UNIX with
Supertek-designed extensions that support
large-scale, performance-intensive
scientific computing,
Supertek UNIX creates an outstanding
environment for interactive applications
development as well as for long-running,
large production jobs.

Supercomputing features added to 
UNIX by Supertek include multi-stream 
batch processing, asynchronous disk 
I/O, a new user-specified priority 
scheme, a highly vectorized applications 
and system runtime environment, 
resource and job accounting facilities, a 
process restart and recovery capability 
for long running production applications, 
and a channel-based I/O interface with 
multiple, independent, intelligent I/O 
processors. 


Software Specifications 


UNIX™ Operating System 

AT&T SYSTEM V/IEEE POSIX Standard 

Supercomputing extensions. 

— Distributed I/O Subsystem. 

— Interactive & Batch Access. 

— Process & Job Recovery. 

— High Performance I/O. 

— User Specified Process Priority Levels. 

Multi-user environment. 

Hierarchical file system.

Interprocess communication. 

Windowing capability. 

FORTRAN Applications 
Development Environment 

sft (Supertek FORTRAN compiler) 

— ANSI ’77. 

— Scalar optimization. 

— Automatic vectorization. 

— VMS™ FORTRAN extensions.

ddt (Dynamic Debug Tool) 

— Interactive, Source-level, Symbolic debugging without 
code recompilation. 

— User may specify execution breakpoints/tracepoints,
and examine and alter values of variables.

upd (Source code control) 

— Source code management librarian.

— Audit trail of code changes. 

— Reversibility of changes. 

Math/Science Libraries 

— Optimized for maximum run-time performance. 


UNIX is a trademark of AT&T.

VMS is a trademark of Digital Equipment Corporation.
Specifications are subject to change without notice.

Features:

• 64 Bit Scientific Minisupercomputer 

• Cray X-MP/416 Instruction Set Compatible

• 40 MFLOPS Peak Vector Performance 

• 20 MIPS Peak Scalar Performance 

• 1, 2, 4, 8, 16 MW Memory Configurations

• Four Ported Memory 




[Diagram labels: Service Processor Unit with RTA; 2.4 MB/s transfer rate, 800 MB capacity; 12.5 MB/s transfer rate, 680 MB capacity; 200 IPS, 1600/6250 BPI; 64 terminals, 18.6 K baud; 600 LPM]